Index-based Similarity Search for Protein Structure Databases

Authors

  • Orhan Çamoglu
  • Tamer Kahveci
  • Ambuj K. Singh
Abstract

We propose new methods for finding similarities in protein structure databases. These methods extract feature vectors from triplets of secondary structure elements (SSEs) of proteins. The feature vectors are then indexed using a multidimensional index structure. Our first technique addresses the problem of finding proteins similar to a given query protein in a protein dataset. It quickly finds promising proteins using the index structure; these proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique addresses the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques improve the pruning time of VAST by a factor of 3 to 3.5 while keeping the sensitivity similar. Our technique can also be combined with DALI and CE to improve their running times by factors of 2 and 2.7, respectively. The software is available online at http://bioserver.cs.ucsb.edu/.
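
The filter step described in the abstract (feature vectors on SSE triplets, an index over those vectors, and a lookup that returns promising proteins for later alignment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each SSE is approximated by a 3-D line segment, the six features are pairwise midpoint distances and direction angles, a uniform grid stands in for the multidimensional index structure, and all function names and step sizes are hypothetical. The statistical scoring model and the final alignment with VAST, DALI, or CE are not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: an SSE is approximated by a 3-D segment (start_xyz, end_xyz).

def midpoint(sse):
    start, end = sse
    return tuple((s + e) / 2.0 for s, e in zip(start, end))

def direction(sse):
    start, end = sse
    return tuple(e - s for s, e in zip(start, end))

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def angle(u, v):
    # Angle in radians between two non-zero vectors, clamped for safety.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def triplet_features(sses):
    """Yield one 6-D vector per SSE triplet (i < j < k):
    three midpoint distances followed by three direction angles."""
    for triple in combinations(sses, 3):
        pairs = list(combinations(triple, 2))
        dists = [distance(midpoint(a), midpoint(b)) for a, b in pairs]
        angles = [angle(direction(a), direction(b)) for a, b in pairs]
        yield tuple(dists + angles)

def cell(feature, dist_step=2.0, angle_step=math.pi / 6):
    # Hypothetical grid resolution; a real index would also probe neighbours.
    steps = (dist_step,) * 3 + (angle_step,) * 3
    return tuple(int(math.floor(x / s)) for x, s in zip(feature, steps))

def build_index(proteins):
    """Map each grid cell to the names of proteins with a triplet in it."""
    index = defaultdict(list)
    for name, sses in proteins.items():
        for feature in triplet_features(sses):
            index[cell(feature)].append(name)
    return index

def find_candidates(index, query_sses):
    """Rank database proteins by how many triplet entries share a cell
    with some triplet of the query."""
    votes = Counter()
    for feature in triplet_features(query_sses):
        votes.update(index.get(cell(feature), ()))
    return votes.most_common()
```

Calling `build_index` on a dictionary of named SSE lists and then `find_candidates(index, query_sses)` returns `(protein_name, votes)` pairs, highest vote count first. Only the top-ranked proteins would then be passed to the pairwise alignment tool, which is where the reported savings in running time would come from.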

Similar resources

Towards Index-based Similarity Search for Protein Structure Databases

We propose two methods for finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. These feature vectors are then indexed using a multidimensional index structure. Our first technique considers the problem of finding proteins similar to a given query protein in a protein dataset. This technique q...

PSI: indexing protein structures for fast similarity search

MOTIVATION: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the protein databases increases. As the sizes of experimentally determined and theoretically estimated p...

MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

High-dimensional indexing has been widely used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over...

The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing...

A Novel Image Structural Similarity Index Considering Image Content Detectability Using Maximally Stable Extremal Region Descriptor

Image content detectability and image structure preservation are closely related concepts with an undeniable role in image quality assessment. However, most of the attention in image quality studies has been paid to image structure evaluation, and few of them have focused on image content detectability. Examining the image structure was first introduced and assessed in the Structural SIMilarity (SSIM) measu...

Journal title:
  • Journal of bioinformatics and computational biology

Volume 2, Issue 1

Pages: -

Publication date: 2004